USA Restaurants Data Visualization project
Akash Gangadharan, Sandhya Iyer, Himani Borana
About the Dataset
Uber Eats is an online food ordering and delivery platform launched by Uber in 2014. Users can browse menus, reviews, and ratings, then order and pay for food from participating restaurants through an iOS or Android app or a web browser. Users can also tip for delivery. Payment is charged to a card on file with Uber. Meals are delivered by couriers using cars, scooters, or bikes, or on foot. It operates in over 6,000 cities across 45 countries.
- This dataset lists restaurants in the USA that are partnered with Uber Eats. The data was collected via web scraping using Python libraries.
- 63k+ USA restaurants and 5 million+ menus from Uber Eats
- You can get the dataset from the link here.
Data Description
For our exploratory data analysis, we consider only the restaurants.csv data.
Project Description
In [43]:
ls
Volume in drive C has no label.
Volume Serial Number is 06B6-21E0
Directory of C:\Users\New User\Downloads\Compressed\uber_data
12/06/2023 06:32 PM <DIR> .
12/06/2023 05:59 PM <DIR> ..
12/01/2023 05:32 PM <DIR> .ipynb_checkpoints
11/29/2023 11:03 PM 308,372,008 df_restaurant_menu1.parquet
11/29/2023 11:44 PM 4,411,638 df_restaurants1.parquet
12/03/2023 03:06 AM 230,223 Group15_Phase3_PowerPoint.pptx
11/29/2023 07:12 PM 890,192,204 restaurant_menu.csv
11/22/2023 06:17 AM 870,834,478 restaurant-menus.csv
11/22/2023 06:18 AM 10,000,371 restaurants.csv
12/05/2023 10:18 PM 16,692,214 restaurants_map.html
11/29/2023 02:33 PM 4,937 test.csv
12/06/2023 06:32 PM 27,214,766 Uber_data_viz.ipynb
9 File(s) 2,127,952,839 bytes
3 Dir(s) 366,317,658,112 bytes free
In [58]:
%%HTML
<script src="require.js"></script>
In [59]:
import plotly.offline as py
py.init_notebook_mode(connected=True)
In [56]:
# Importing important libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.basemap import Basemap
import matplotlib.cm as cm
import folium
from uszipcode import SearchEngine
import itertools
import plotly.express as px
from IPython.display import HTML
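import os
# Read the Mapbox token from the environment instead of hardcoding it in the notebook.
# The variable name MAPBOX_ACCESS_TOKEN is an assumption; use whatever name your setup defines.
YOUR_MAPBOX_ACCESS_TOKEN = os.environ.get('MAPBOX_ACCESS_TOKEN', '')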
In [3]:
df_restaurants = pd.read_csv('restaurants.csv')
df_restaurants.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63469 entries, 0 to 63468
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   id            63469 non-null  int64
 1   position      63469 non-null  int64
 2   name          63469 non-null  object
 3   score         35302 non-null  float64
 4   ratings       35302 non-null  float64
 5   category      63384 non-null  object
 6   price_range   52852 non-null  object
 7   full_address  63016 non-null  object
 8   zip_code      62952 non-null  object
 9   lat           63469 non-null  float64
 10  lng           63469 non-null  float64
dtypes: float64(4), int64(2), object(5)
memory usage: 5.3+ MB
In [4]:
df_restaurants.shape
Out[4]:
(63469, 11)
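The non-null counts reported by `info()` already tell us how many rows each column will cost us once we drop missing values. A minimal sketch of that arithmetic, using only the counts printed above (the helper name `missing_summary` is ours, not part of pandas):

```python
# Non-null counts copied from the df_restaurants.info() output above
TOTAL_ROWS = 63469
non_null = {
    "score": 35302,
    "ratings": 35302,
    "category": 63384,
    "price_range": 52852,
    "full_address": 63016,
    "zip_code": 62952,
}

def missing_summary(non_null_counts, total):
    """Return {column: (missing_count, missing_percent)}."""
    return {
        col: (total - n, round(100 * (total - n) / total, 1))
        for col, n in non_null_counts.items()
    }

summary = missing_summary(non_null, TOTAL_ROWS)

# Print columns from most to least missing
for col, (count, pct) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{col:<13}{count:>7}  {pct:>5}%")
```

The `score` and `ratings` columns account for most of the missing data, so any step that drops rows without a score removes a large share of the restaurants.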
In [5]:
# Clean the restaurants data
df_restaurants = df_restaurants.rename(columns={'name': 'restaurant_name','lng': 'long'})
# Function to sort and standardize the category strings
def standardize_category(category):
    if not isinstance(category, str):
        return category # Return as is if not a string
    # Note: str.replace is case-sensitive, so 'burgers' becomes 'Burger',
    # 'burger' becomes 'Burgers', and an existing 'Burgers' is left unchanged
    category = category.replace('burgers', 'Burger').replace('burger', 'Burgers').replace('&', 'and')
    categories = category.split(', ')
    categories.sort()
    return ', '.join(categories)
# Apply the function to the category column
df_restaurants['standardized_category'] = df_restaurants['category'].apply(standardize_category)
df_restaurants.drop(columns=['category', 'full_address'], inplace=True)
df_restaurants = df_restaurants.rename(columns={'standardized_category': 'category'})
# Replace values in 'price_range' column
df_restaurants['price_range'] = df_restaurants['price_range'].replace({'$': 'Inexpensive', '$$': 'Moderately Priced', '$$$': 'Expensive', '$$$$': 'Very Expensive'})
df_restaurants = df_restaurants.dropna(subset=['score','zip_code', 'price_range','category'])
df_restaurants.drop_duplicates(inplace=True)
missing_values = df_restaurants.isnull().sum()
print(missing_values)
id                 0
position           0
restaurant_name    0
score              0
ratings            0
price_range        0
zip_code           0
lat                0
long               0
category           0
dtype: int64
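The chained `replace` calls in `standardize_category` are case-sensitive, so lowercase `burgers` ends up as `Burger` while lowercase `burger` ends up as `Burgers`. If the goal is a single `Burgers` label (our assumption, the notebook does not state it), a regex-based variant is one option. The function name `normalize_category` and the sample strings below are illustrative only:

```python
import re

def normalize_category(category):
    """Sort comma-separated tags and unify burger spellings (assumed intent)."""
    if not isinstance(category, str):
        return category  # leave NaN / non-string values untouched
    category = category.replace("&", "and")
    # Whole-word, case-insensitive match: 'burger', 'burgers', 'Burger' -> 'Burgers'
    category = re.sub(r"\bburgers?\b", "Burgers", category, flags=re.IGNORECASE)
    # Strip whitespace around each tag, then sort so tag order no longer matters
    tags = sorted(tag.strip() for tag in category.split(","))
    return ", ".join(tags)

# Illustrative inputs, not rows from the dataset
samples = [
    "Sandwich, burgers, American",
    "American, Burger, Sandwich",
    "Ice Cream & Frozen Yogurt, Desserts",
]
normalized = [normalize_category(s) for s in samples]
for before, after in zip(samples, normalized):
    print(f"{before!r} -> {after!r}")
```

Because the tags are sorted after normalization, the first two samples collapse to the same category string, which is what makes later grouping by category meaningful.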
In [6]:
df_restaurants.head(5)
Out[6]:
| | id | position | restaurant_name | score | ratings | price_range | zip_code | lat | long | category |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 7 | 27 | Jinsei Sushi | 4.7 | 63.0 | Inexpensive | 35209 | 33.480440 | -86.790440 | Asian, Japanese, Sushi |
| 13 | 14 | 51 | Panera (521 Fieldstown Road) | 4.6 | 44.0 | Inexpensive | 35071 | 33.651407 | -86.819247 | American, Breakfast and Brunch, Chicken, Famil... |
| 15 | 16 | 88 | Jeni's Splendid Ice Cream (Pepper Place) | 5.0 | 20.0 | Expensive | 35233 | 33.516600 | -86.789950 | Comfort Food, Desserts, Ice Cream and Frozen Y... |
| 18 | 19 | 30 | Falafel Cafe | 4.9 | 48.0 | Inexpensive | 35233 | 33.508353 | -86.803170 | Greek, Healthy, Mediterranean, Middle Eastern,... |
| 19 | 20 | 40 | MrBeast Burger (838 Odum Road) | 3.7 | 19.0 | Moderately Priced | 35071 | 33.645480 | -86.826260 | American, Burgers, Sandwich |
In [7]:
search = SearchEngine()
# Function to get state from zip code
def get_state_from_zip(zip_code):
    zipcode = search.by_zipcode(zip_code)
    if zipcode:
        return zipcode.state
    else:
        return None
# Function to get city from zip code
def get_city_from_zip(zip_code):
    zipcode_info = search.by_zipcode(zip_code)
    return zipcode_info.major_city if zipcode_info else None
df_restaurants['state'] = df_restaurants['zip_code'].apply(get_state_from_zip)
df_restaurants['city'] = df_restaurants['zip_code'].apply(get_city_from_zip)
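The cell above looks up every row's zip code twice, once for the state and once for the city, even though many restaurants share a zip code. A minimal sketch of caching the lookup so each distinct zip code is resolved once. To keep the example self-contained, `_ZIP_TABLE` is a hypothetical stand-in for the uszipcode search, filled with values that appear in this notebook's output:

```python
from functools import lru_cache

# Hypothetical stand-in for the uszipcode lookup (values from this notebook's output)
_ZIP_TABLE = {
    "35209": ("AL", "Birmingham"),
    "35233": ("AL", "Birmingham"),
    "35071": ("AL", "Gardendale"),
}

@lru_cache(maxsize=None)
def lookup_zip(zip_code):
    """Return (state, city) for a zip code, or (None, None) if unknown."""
    return _ZIP_TABLE.get(str(zip_code), (None, None))

# Repeated zip codes hit the cache instead of triggering a new lookup
zip_codes = ["35209", "35071", "35233", "35233", "35071", "00000"]
states, cities = zip(*(lookup_zip(z) for z in zip_codes))

info = lookup_zip.cache_info()
print(f"lookups performed: {info.misses}, served from cache: {info.hits}")
```

In the notebook, the body of `lookup_zip` would presumably call `search.by_zipcode` once and return its `state` and `major_city` together, so both new columns come from a single lookup per distinct zip code.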
In [8]:
df_restaurants.head()
Out[8]:
| | id | position | restaurant_name | score | ratings | price_range | zip_code | lat | long | category | state | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 7 | 27 | Jinsei Sushi | 4.7 | 63.0 | Inexpensive | 35209 | 33.480440 | -86.790440 | Asian, Japanese, Sushi | AL | Birmingham |
| 13 | 14 | 51 | Panera (521 Fieldstown Road) | 4.6 | 44.0 | Inexpensive | 35071 | 33.651407 | -86.819247 | American, Breakfast and Brunch, Chicken, Famil... | AL | Gardendale |
| 15 | 16 | 88 | Jeni's Splendid Ice Cream (Pepper Place) | 5.0 | 20.0 | Expensive | 35233 | 33.516600 | -86.789950 | Comfort Food, Desserts, Ice Cream and Frozen Y... | AL | Birmingham |
| 18 | 19 | 30 | Falafel Cafe | 4.9 | 48.0 | Inexpensive | 35233 | 33.508353 | -86.803170 | Greek, Healthy, Mediterranean, Middle Eastern,... | AL | Birmingham |
| 19 | 20 | 40 | MrBeast Burger (838 Odum Road) | 3.7 | 19.0 | Moderately Priced | 35071 | 33.645480 | -86.826260 | American, Burgers, Sandwich | AL | Gardendale |
In [9]:
# Set up the basemap
plt.figure(figsize=(12, 8))
m = Basemap(projection='merc', llcrnrlat=min(df_restaurants['lat']), urcrnrlat=max(df_restaurants['lat']),
llcrnrlon=min(df_restaurants['long']), urcrnrlon=max(df_restaurants['long']), lat_ts=20, resolution='c')
m.drawcoastlines()
m.drawcountries()
m.fillcontinents(color='lightgray', lake_color='white')
m.drawmapboundary(fill_color='white')
# Plot each restaurant location
x, y = m(df_restaurants['long'].values, df_restaurants['lat'].values)
m.scatter(x, y, s=10, color='red', marker='o', alpha=0.2)
plt.title('Distribution of Restaurants by Geography')
plt.show()
The map above shows the geographic distribution of restaurants.
At this scale the points overlap and the map is hard to read, so we create another, interactive map using a different Python package, Folium.
In [10]:
# Calculate average latitude and longitude for map initialization
average_latitude = df_restaurants['lat'].mean()
average_longitude = df_restaurants['long'].mean()
# Create a Folium map
m = folium.Map(location=[average_latitude, average_longitude], zoom_start=12)
# Add markers for each restaurant
for _, row in df_restaurants.iterrows():
    folium.CircleMarker(
        location=[row['lat'], row['long']],
        radius=2,  # Size of the circle marker
        color='red',
        fill=True,
        fill_opacity=0.2
    ).add_to(m)
# Display the map
m
Out[10]:
(Interactive Folium map of restaurant locations; not rendered in this export.)
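The Folium cell centers the map on the mean coordinates with a fixed zoom and adds one marker per restaurant, which can make the notebook heavy. A minimal sketch of two helpers that could ease this: one computes a bounding box for the points, the other takes a reproducible random subsample. Both function names and the `max_points` value are our own illustrative choices:

```python
import random

def bounding_box(points):
    """Return [[south, west], [north, east]] for a list of (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

def subsample(points, max_points, seed=0):
    """Return at most max_points points, chosen reproducibly at random."""
    if len(points) <= max_points:
        return list(points)
    return random.Random(seed).sample(points, max_points)

# Coordinates copied from the first rows of the table shown earlier
points = [
    (33.480440, -86.790440),
    (33.651407, -86.819247),
    (33.516600, -86.789950),
    (33.508353, -86.803170),
    (33.645480, -86.826260),
]
bounds = bounding_box(points)
shown = subsample(points, max_points=3)
print(bounds)
print(len(shown), "points kept for plotting")
```

A box in this shape can be passed to the map's `fit_bounds` method so the initial view covers all plotted points, and only the subsampled points need to be added as markers.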